VIDEO ENCODING



FIELD OF THE INVENTION 

The present invention relates to video encoding, and in particular to a method for use in
encoding video data, a video encoding module and a video encoder.

BACKGROUND 

Bit rate control is important in video coding and related applications and is normally
achieved by selecting an appropriate quantization matrix or quantizer to encode the picture.
An 8x8 quantizer is an 8x8 matrix of integer valued step sizes that is used to divide video
data in the form of 8x8 matrices of frequency coefficients produced by the discrete cosine
transformation (DCT) of input video frames, and thereby reduce the amount of encoded
video data. In the case of the digital video (DV) standard, as described in Specifications of
Consumer-Use Digital VCRs using 6.3mm magnetic tape, HD Digital VCR Conference,
December 1994, the quantizer is determined by three numeric parameters known as area,
class and quantization number. The area and class are integers between 0 and 3, inclusive.
For a given pixel, the area number for that pixel is determined by the pixel's position in an
8x8 pixel block. A class number is assigned to each 8x8 pixel block on the basis of the
block's content, for example, quantization noise and the maximum absolute value of the
block's AC coefficients. The quantization number or step is an integer that specifies the
degree of image quantization, and is assigned to a macroblock consisting of four
luminance 8x8 pixel blocks and two chrominance 8x8 pixel blocks. The combination of
the class and quantization number determines a quantization vector comprising four
quantizer coefficients, one for each area number. An 8x8 quantizer is constructed from the
quantization vector by entering each coefficient into corresponding positions in the
quantizer, according to area number. The resulting quantizer determines the quantity of
output video data generated from a given macroblock of input video data. A video
segment, consisting of five macroblocks, is encoded within a constant bit budget by
selecting a suitable quantization vector for each macroblock so that the bit rate of the
encoded video data is as close as possible to a constant target value.

US Patent 5,677,734, Method and Apparatus for Modifying the Quantization Step of each
Macro-block in a Video Segment, describes a method of modifying the quantization step of
each macroblock in a video segment. As shown in the accompanying Figure 1, a video
encoding system includes an activity detector 104 that detects the picture activity of each
8x8 pixel block and classifies the blocks by class number. Notwithstanding its name, a
data estimation circuit 108 calculates the exact number of bits generated by the data in the
segment memory 103 by quantization, run-length and Huffman coding, given a
quantization number. A first quantization step decision circuit 106 determines a segment
wide quantization number, and a second quantization step decision circuit 107 modifies the
quantization number for each macroblock so that the quantity of quantized data is below a
predetermined bit budget. A quantization circuit 105 quantizes the data with the resulting
quantization number and a variable length coding (VLC) circuit 110 encodes the data by
run-length and Huffman coding.

A paper by S. Rhee et al., A New Quantizer Selection Scheme for Digital VCR, IEEE
Transactions on Consumer Electronics, Vol. 43, No. 3, Aug 1997, discloses a method of
determining the quantization and class numbers to select a quantizer for each 8x8 pixel
block. A modified quantizer map, QID, from a reduced set of quantization vectors was
introduced. A segment-wide QID was first selected by calculating data quantity through
quantization and variable length encoding. The selected QID was then mapped to the
respective quantization and class numbers. The quantization vector for each 8x8 pixel
block was fine-tuned by adjusting the class number according to the calculated data
quantity.

A paper by W. Ding and B. Liu, Rate Control of MPEG Video Coding and Recording by
Rate-Quantization Modeling, IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 6, No. 1, Feb 1996, describes controlling the video bit rate by using a
feedback re-encoding method with a rate-quantization model. This rate-quantization
model was adapted with re-encoding results to choose a suitable reference quantization
parameter for meeting a target bit rate.

The difficulty in controlling bit rate lies in how to choose the quantization parameters to
meet a target bit rate budget. Prior art methods have attempted to solve this difficulty by
trying out selected combinations of all possible quantization parameters. However, these
methods require complex hardware and/or significant computational overheads. They
require a multi-pass implementation of the processes of quantization and variable length
encoding.

Even rate-quantization modelling is dependent on either re-encoding or training sequences
and classification schemes. The former has the disadvantage of local adaptation with
quantization parameter and the requirement of two to three-pass encoding, while the latter
is impractical for real-time video transmission and quality control. In addition, the
rate-quantization model has only been used on a frame basis. There may be a model mismatch
for finer bit estimation due to the fast changing nature of the rate-quantization model at
low bit rates.

It is desired, therefore, to provide a method for use in encoding video data, a video
encoding module and a video encoder that alleviate one or more of the above difficulties,
or at least provide a useful alternative to existing methods, modules and encoders.

SUMMARY OF THE INVENTION 

In accordance with the present invention there is provided a method for use in encoding
video data, including generating metric values for said video data based on a metric
function and respective encoding parameters, and selecting at least one of said encoding
parameters on the basis of a desired quantity of encoded video data and a predetermined
relationship between metric values and respective quantities of encoded video data.

Preferably, said metric function is based on AC coefficients of discrete cosine
transformation data generated from said video data.

Advantageously, the metric function may be a spatial activity metric based on a sum of
weighted AC discrete cosine transformation coefficients.

Advantageously, the metric function may be based on the number of non-zero AC discrete 
cosine transformation coefficients after quantization. 

The present invention also provides a video encoding module having components for
executing the steps of any one of the preceding methods.

The present invention also provides a video encoding module, including a predictor
module for determining estimates for the quantity of encoded video data using respective
quantization vectors, and a selector module for selecting at least one of said quantization
vectors on the basis of said estimates.

The present invention also provides a video encoding module, including a predictor
module for determining estimates for bit rate values representing the quantity of encoded
video data using respective quantization vectors, a selector module for selecting two of
said quantization vectors on the basis of said estimates, first quantization and variable
length coding modules for generating first encoded video data using a first of said selected
quantization vectors, second quantization and variable length coding modules for
generating second encoded video data using a second of said selected quantization vectors,
and an output decision module for selecting one of said first encoded video data and said
second encoded video data for output on the basis of at least one of the bit rate value of
said first encoded video data and the bit rate value of said second encoded video data.

The present invention also provides a video encoder, including any one of the above video
encoding modules.

The present invention also provides a digital video (DV) encoder, including any one of the
above video encoding modules.

The present invention also provides an MPEG encoder, including any one of the above 
video encoding modules. 

Instead of calculating the data quantity to be generated by actually encoding the video data,
preferred embodiments of the invention use a predictor to estimate the data quantity. The
predictor uses a metric for the process of quantization, run-length and Huffman coding, to
predict an output bit rate or data quantity.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention are hereinafter described, by way of 
example only, with reference to the accompanying drawings, wherein: 

Figure 1 is a block diagram of part of a prior art DV encoder; 

Figure 2 is a block diagram of a first preferred embodiment of a video encoder;

Figure 3 is a block diagram of a first preferred embodiment of a predictor of the
encoder;

Figure 4 is a flow diagram of a process executed by the predictor to determine
s_actp from AC DCT coefficients;

Figure 5 is a graph of a power law function used by the predictor to represent the
relationship between the number of bits generated by quantization and variable length
coding and s_act, the sum of spatial activity metric values;

Figure 6 is a block diagram of a second preferred embodiment of a predictor of the
encoder;

Figure 7 is a flow diagram of a method executed by the predictor for determining
sym, the number of non-zero AC DCT coefficients after quantization;

Figure 8 is a schematic diagram of a comparator of the predictor, used to compute a
6 bit flag for determining sym;

Figure 9 is a schematic diagram illustrating the addition of accumulator cell values
in the predictor for a selected quantization vector;

Figure 10 is a graph of a power law function used by the predictor to represent the
relationship between the number of bits generated through the process of quantization and
variable length coding and sym;

Figure 11 is a block diagram of a second preferred embodiment of a video encoder,
including feedback control;

Figure 12 is a block diagram of a third preferred embodiment of a video encoder,
including two-pass encoding predictive rate control with or without frame-based feedback
control;

Figure 13 is a block diagram of a fourth preferred embodiment of a video encoder,
based on sym values and with feedback control; and

Figure 14 is a schematic diagram illustrating a finite impulse response module of
the encoder of Figure 13.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Bit rate control involves determining a suitable set of encoding parameters for generating
encoded video data with a desired bit rate from input video data. Encoders of the preferred
embodiments described below each provide a predictor to estimate the quantity of data
produced by encoding given input video data. The difficulty of achieving this is apparent
from the observation that the relationship between the quantity of encoded data and
quantization parameters varies for different macroblocks, different video segments,
different parts of the same sequence and even different kinds of sequences. Consequently,
the preferred embodiments provide metric functions, as described below, to determine
metric values from input video data that correlate with the actual quantity of output video
data generated by encoding the input video data. An estimate for the quantity of data is
then determined from a predetermined relationship between metric values and bit counts.
The relationship is determined experimentally for a given metric function and reference
video data. The predictor generates a metric value from input video data using the metric
function, and then converts the metric value to a bit count estimate using the predetermined
relationship.

To determine suitable metric functions, the three processes of quantization, run-length
encoding and Huffman encoding have been considered. A close inspection of the Huffman
table reveals that longer codewords are associated with longer run lengths and also with
larger amplitudes. The reason for this is that long run lengths and high amplitudes occur
with lower probabilities. Consequently, an 8x8 pixel block with many low amplitude and
short run codes can generate a total codeword of comparable length to another block that
has few high amplitude and long run codes. Hence the spatial activity of the block is a
good measure of the codelength.

A conventional metric function for analysing the contents of a pixel block using the AC
energy or variance, $\sum f(u,v)^2$, where f(u,v) is the DCT coefficient of the AC block
element with coordinates (u,v), is insufficient. However, a slightly modified spatial
activity metric, described by

$$\sum_{(u,v) \neq (0,0)} \left| f(u,v)\, w(u,v) \right|$$

can be used to estimate codelength. This metric, the sum of the weighted AC coefficients
in an 8x8 pixel block, where w(u,v) are the weights applied to the AC coefficients in DV
systems, can be correlated with the quantity of data generated by variable length coding.
By dividing this metric by the quantization step size used, a block spatial activity metric
that also incorporates the influence of quantization is obtained. This block spatial activity
metric,

$$act = \sum_{(u,v) \neq (0,0)} \frac{\left| f(u,v)\, w(u,v) \right|}{q(u,v)},$$

provides a link between quantization and the number of bits of encoded data, where q(u,v)
refers to the quantization step size used on element (u,v) of the block.

The metric may be further extended to include the spatial influence of the AC coefficients
in the block. The further the AC coefficient is from the DC coefficient in terms of
run-length coding, the better the chances of a longer run, given the characteristics of the
discrete cosine transform. Hence a longer codeword is expected for an AC coefficient far
from the DC coefficient. This extended metric is given by

$$actp = \sum_{(u,v) \neq (0,0)} h(u,v)\, \frac{\left| f(u,v)\, w(u,v) \right|}{q(u,v)},$$

where h(u,v) is a spatial weighting on the AC coefficients and is determined
experimentally.
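The two block metrics can be illustrated with a minimal Python sketch. It assumes the input block already holds the weighted coefficients f(u,v)·w(u,v); the function names, the uniform quantizer and the spatial weights h used in the demo are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def block_act(weighted, q):
    """Sum of |weighted AC coefficient| / quantization step over one 8x8 block."""
    total = 0.0
    for u in range(8):
        for v in range(8):
            if u == 0 and v == 0:
                continue  # the DC coefficient is excluded
            total += abs(weighted[u][v]) / q[u][v]
    return total


def block_actp(weighted, q, h):
    """As block_act, with each term scaled by the spatial weight h[u][v]."""
    total = 0.0
    for u in range(8):
        for v in range(8):
            if u == 0 and v == 0:
                continue
            total += h[u][v] * abs(weighted[u][v]) / q[u][v]
    return total


# Demo data (made up): one low-frequency and one high-frequency coefficient.
weighted = [[0] * 8 for _ in range(8)]
weighted[0][0] = 100   # DC, ignored by both metrics
weighted[0][1] = 8
weighted[7][7] = -16
q = [[2] * 8 for _ in range(8)]                                    # uniform step of 2
h = [[1 if u + v < 8 else 2 for v in range(8)] for u in range(8)]  # hypothetical weights

print(block_act(weighted, q))      # 12.0
print(block_actp(weighted, q, h))  # 20.0
```

In the demo the high-frequency coefficient contributes twice as much to actp as to act, which is the intended effect of the spatial weighting.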



The spatial activity metric functions act and actp described above can be independently
used to generate a metric value from input video data and a given quantization vector that
correlates with the actual quantity of data generated by encoding the input video data using
the same quantization vector, and hence the bit rate. By generating a series of these values
for respective quantization vectors, a general relationship can be determined between the
quantity of encoded video data and either one of the spatial activity metrics, using
reference input video data. Although this relationship is determined from reference video
data, it can be applied in a quasi-universal manner to any input video data to determine an
accurate estimate for the quantity of encoded data or the bit rate. The estimate enables
input video data to be encoded with simple single-pass quantization and variable length
encoding steps to generate encoded video data having a bit rate close to a desired bit rate.



The preferred embodiments of the invention are described in relation to the DV standard in
order to select quantization vectors for 8x8 pixel blocks so as to maintain a constant bit
rate. However, it will be appreciated that the invention can be applied in a more general
sense to select encoding parameters to provide the closest match to any (e.g., variable)
target bit rate or data quantity.



WO 03/056839

PCT/SG01/00261



The choice of an appropriate quantizer is important for DV, because excess bits that are
generated through the quantization and variable length coding processes are dropped. The
DV encoder therefore ensures that the number of bits of encoded data representing the AC
coefficients of each five-macroblock video segment (vs) is kept below a predetermined bit
budget:

$$\sum_{i=1}^{5} bits_i \le targetbits_{vs}, \qquad bits_i = f(QV_i),$$

where bits_i is the number of bits generated from the variable length coding of
macroblock i of the video segment, and targetbits_vs is a predetermined bit budget for the
video segment. For example, the bit budget may be 2560. A set of five quantization
vectors is to be found for the video segment, one for each macroblock, and QV_i refers to
the quantization vector used for macroblock i.



The quantization problem in DV is complicated by the use of different quantization steps
for each of the four different areas of an 8x8 pixel block. For application of the actp
metric to DV, the sum of the spatial activity metrics for a macroblock can be written as

$$s\_actp_i = \sum_{j=1}^{6} \sum_{k=0}^{3} \frac{\sum \left| AC_{ijk} \right|}{q_{ijk}}\, h(k),$$

where i, j, k denote, respectively, the index of the macroblock in the video segment, the
index of the j-th 8x8 pixel block in that macroblock, and the area number. The magnitudes
|AC_ijk| of the AC coefficients in a particular area k of an 8x8 pixel block are summed and
the sum is then divided by the common quantization step q_ijk used for that area. The result
is multiplied by a weight h(k) relating to the emphasis of the AC coefficients of that
particular area. For example, the weights for the respective areas may be in the ratio of
{1,2,4,8} for respective areas {0,1,2,3}.
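The per-macroblock sum can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the area map `AREA_OF` is a made-up assignment of positions to areas 0 to 3 (it is not the area table of the DV standard, which would be substituted in practice).

```python
# Hypothetical area map: areas grow with distance from the DC position.
AREA_OF = [[min(3, (u + v) // 4) for v in range(8)] for u in range(8)]


def s_actp(blocks, qv, area_of=AREA_OF, h=(1, 2, 4, 8)):
    """Sum of actp over the 8x8 blocks of one macroblock.

    blocks: list of 8x8 blocks of weighted DCT coefficients (six in DV)
    qv:     quantization vector, one step size per area number
    h:      emphasis weight per area number
    """
    total = 0.0
    for block in blocks:
        sums = [0, 0, 0, 0]              # per-area sums of |AC|
        for u in range(8):
            for v in range(8):
                if u == 0 and v == 0:
                    continue             # skip the DC coefficient
                sums[area_of[u][v]] += abs(block[u][v])
        total += sum(h[k] * sums[k] / qv[k] for k in range(4))
    return total


# Demo macroblock (made up): six identical blocks with two non-zero AC terms.
block = [[0] * 8 for _ in range(8)]
block[0][1] = 4    # falls in area 0 of the hypothetical map
block[7][7] = -8   # falls in area 3
macroblock = [block] * 6

print(s_actp(macroblock, qv=(2, 4, 4, 8)))  # 60.0
```

Each block contributes 1·4/2 + 8·8/8 = 10, so the six blocks give 60.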






A digital video (DV) encoder, as shown in Figure 2, includes a discrete cosine
transformation (DCT) mode decision module 402, a weighted forward DCT (FDCT)
module 401, a segment memory 403, a predictor module 406, a selector module 405, a
quantization module 404, and a variable length encoding module 407. The DCT mode
decision module 402 analyses each 8x8 pixel block based on edge and/or inter-field motion
detection, as described in P.H.N. de With, Motion-Adaptive Intraframe Transform Coding
of Video Signals, Philips Journal of Research, Vol. 44, No. 2/3, pp 345-364, Apr 1989, to
select a suitable DCT mode, either 8x8 or 2x4x8. The weighted FDCT module 401
performs a two-dimensional discrete cosine transform, followed by a weighting multiplier.
A video segment of data, consisting of five macroblocks from different locations of a
frame, is stored in the segment memory 403 for further processing. Each macroblock of
the video segment consists of four luminance 8x8 pixel blocks and two chrominance 8x8
pixel blocks.

The video encoder controls the bit rate of an encoded data video segment by determining
predicted values or estimates for the quantity of encoded data for each macroblock within
the segment for different quantization vectors, and selecting a quantization vector for each
macroblock to obtain the closest bit rate to a desired target bit rate.

In the first preferred embodiments, predicted bit counts are determined by methods based
on block spatial activity and executed by the predictor 406. A first preferred embodiment
is first described, in which the predictor 406 executes a method based on the spatial
activity function actp. A second preferred embodiment is then described in which the
predictor 406 uses the simpler spatial activity function act. However, because
these embodiments differ only in the form of the spatial activity metric function used,
much of the description below applies equally to both embodiments, as indicated.

As shown in Figure 3, the predictor 406 includes an absolute block 201, a quantizer block
203, an accumulator 204, and a spatial activity metric predictor 205. For the actp metric,
the predictor 406 also includes a spatial emphasis block 202. The predictor 406 is a
dedicated hardware circuit, as described below, or a processor such as an ST100 or ST200
processor manufactured by STMicroelectronics, in which case the blocks 201 to 205 are
software modules executed by the processor. The other components of the encoder are
standard DV components known in the art, such as those of a Divio™ NW701 single chip
CODEC.

The predictor module 406 models the process of quantization and variable length coding
using the actp metric function. It calculates the s_actp values as described above, and then
predicts the number of bits generated from a macroblock for all possible quantization
vectors, using an empirically determined relationship between s_actp and bit count, as
described below.

For a given set of quantization parameters, the bit count contribution from an 8x8 pixel
block is estimated by the predictor 406 of Figure 3 as follows. The absolute block 201
reads in the DCT coefficients for the block sequentially and then outputs their magnitudes.
For the actp metric, where spatial influence is taken into account, the spatial emphasis
block 202 multiplies each coefficient by a weight based on the position of the coefficient in
the pixel block. The quantizer block 203 then quantizes the coefficients with respect to the
quantization step size. However, the quantization and spatial emphasis steps can
alternatively be combined and reduced to shift operations, as described below. The
accumulator 204 sums the quantized, weighted coefficients to determine the act or actp
metric value for the pixel block, as appropriate. The metric value for a macroblock is
determined by summing the act/actp values of its six constituent pixel blocks. The spatial
activity metric predictor 205 derives an estimated bit count for the macroblock based on
the summed act/actp values and an empirically determined relationship between metric
values and bit counts. This relationship determines the conversion between metric values
and bit counts, and can be implemented in the spatial activity metric predictor 205 by
calculating a mathematical function, as described below, or, to improve efficiency, as a
non-linear look-up table. The process of Figure 2 is repeated for all quantization vectors to
determine a bit count estimate for each quantization vector. However, the repetition may
be minimised or even eliminated by identifying a basic set of quantization vectors,
determining act/actp values for these basic quantization vectors and deriving the remaining
act/actp values from the basic act/actp values with shift operations, as described
below.

Different combinations of parameters such as the quantization and class numbers can be
used to generate an indexed set of possible quantization vectors, as shown in Table 1.
Several of these vectors are multiples of other vectors in the set, allowing them to be
classified into four groups, with the members of each group being multiples (by powers of
2) of a common quantization unit vector q_u for the group, as shown in Table 2. A set of
four basic quantization vectors b can then be defined for the respective groups, as shown
in Table 2, as the vector quotients of a weight vector w representing the weights h(k)
applied to different areas of an 8x8 pixel block, as described above, and the four
quantization unit vectors q_u.


Table 1

Index    Quantization Vector
  0       1   1   1   1
  1       1   1   1   2
  2       1   1   2   2
  3       2   2   2   2
  4       2   2   2   4
  5       2   2   4   4
  6       2   4   4   8
  7       4   4   8   8
  8       4   8   8  16
  9       8   8  16  16
 10       8  16  16  32
 11      16  16  32  32



Table 2

Quantization      Quantization     Shift Value    Basic Vector b = w/q_u,
Vector            Unit Vector      x              w = [1 2 4 8]

 1  1  1  1       1 1 1 1          0              1 2 4 8
 2  2  2  2                        1

 1  1  1  2       1 1 1 2          0              1 2 4 4
 2  2  2  4                        1

 1  1  2  2       1 1 2 2          0              1 2 2 4
 2  2  4  4                        1
 4  4  8  8                        2
 8  8 16 16                        3
16 16 32 32                        4

 2  4  4  8       1 2 2 4          1              1 1 2 2
 4  8  8 16                        2
 8 16 16 32                        3
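The grouping in Table 2 can be reproduced from Table 1 with a short Python sketch. It relies on one observation that is an assumption of this illustration rather than a statement in the text: since every group member is a power-of-two multiple of its unit vector, the common factor is the greatest common divisor of the four step sizes. The function name `decompose` is hypothetical.

```python
from functools import reduce
from math import gcd

# Quantization vectors of Table 1, in index order.
QUANT_VECTORS = [
    (1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 2), (2, 2, 2, 2),
    (2, 2, 2, 4), (2, 2, 4, 4), (2, 4, 4, 8), (4, 4, 8, 8),
    (4, 8, 8, 16), (8, 8, 16, 16), (8, 16, 16, 32), (16, 16, 32, 32),
]
W = (1, 2, 4, 8)  # area weights h(k)


def decompose(qv, w=W):
    """Return (unit vector, shift value x, basic vector b) for one vector."""
    factor = reduce(gcd, qv)                  # common power-of-two factor
    unit = tuple(step // factor for step in qv)
    shift = factor.bit_length() - 1           # log2 of the common factor
    basic = tuple(wk // uk for wk, uk in zip(w, unit))
    return unit, shift, basic


# Group the vectors by their unit vector, as in Table 2.
groups = {}
for qv in QUANT_VECTORS:
    unit, shift, basic = decompose(qv)
    groups.setdefault(unit, []).append((qv, shift))

for unit, members in groups.items():
    print(unit, decompose(members[0][0])[2], members)
```

Running the sketch yields four groups whose unit vectors, shift values and basic vectors agree with the rows of Table 2.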





The steps of a process executed by the predictor 406 for predicting bits at the macroblock
level using the basic vectors b are shown in Table 3. The actp values of an 8x8 pixel
block for all four basic quantization vectors b are first determined, as described below.
Then the actp values for all quantization vectors, QVs, are derived from the basic values
through shift and addition operations, as described below. The actp values for a
macroblock are then determined by summing the corresponding actp values of the six
constituent pixel blocks. The corresponding bit count estimates are then determined at the
macroblock level from the predetermined relationship between s_actp and bit count.
Table 3

The predictor 406 includes a simple and efficient hardware circuit to determine s_actp. A
process executed by the hardware circuit for determining s_actp for all possible
quantization vectors of an 8x8 pixel block is shown in Figure 4. At step 501, the
magnitudes of the AC coefficients are summed separately according to area number to
form four values sum_0 to sum_3. At step 502, each of the four sums is shifted left by the
number of bits given by the respective element of the basic vector b expressed as a power
of two, and the four resulting values are stored in tmp_0 to tmp_3. For
example, if the quantization vector is [8 16 16 32], the quantization unit vector q_u is
[1 2 2 4] and the corresponding basic vector from Table 2 is then [1 1 2 2], or
[2^0, 2^0, 2^1, 2^1]. The sum for area 0, sum_0, is then shifted left by 0, and sum_1, sum_2, and
sum_3 are shifted by 0, 1, and 1 bits, respectively. These shifted sums are added together to
generate basic s_actp values at step 503. The s_actp values for an 8x8 pixel block are then
found at step 504 by shifting x bits to the right from the basic s_actp values using the x
values given in Table 2. The final value of s_actp for a macroblock is determined by
processing six such pixel blocks and summing the s_actp values for each block.
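Steps 502 to 504 can be sketched in Python and checked against the direct weighted division. The function names and the per-area sums in the demo are hypothetical; the quantization vector, basic vector and shift value are those of the worked example above.

```python
def s_actp_shift(area_sums, basic_log2, x):
    """Steps 502-504 for one 8x8 block, using integer shifts and adds only.

    area_sums:  sum_0..sum_3 from step 501
    basic_log2: base-2 logarithm of each basic-vector element
    x:          right-shift value of the chosen quantization vector (Table 2)
    """
    tmp = [s << n for s, n in zip(area_sums, basic_log2)]  # step 502
    basic_value = sum(tmp)                                 # step 503
    return basic_value >> x                                # step 504


def s_actp_direct(area_sums, qv, w=(1, 2, 4, 8)):
    """Reference value: weight each area sum and divide by its step size."""
    return sum(wk * s / q for wk, s, q in zip(w, area_sums, qv))


# Quantization vector [8 16 16 32]: basic vector [1 1 2 2], shift value x = 3.
area_sums = (80, 40, 24, 16)  # made-up per-area sums of |AC|
print(s_actp_shift(area_sums, basic_log2=(0, 0, 1, 1), x=3))  # 25
print(s_actp_direct(area_sums, qv=(8, 16, 16, 32)))           # 25.0
```

The right shift truncates, so in general the shift-based value equals the direct value rounded down; the two agree exactly here because the basic value (200) is a multiple of 8.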






The set of bit count estimates for each macroblock determined by the predictor 406 are
processed by the selector module 405 to choose an optimal quantization vector for each
macroblock, such that the sum of the predicted numbers of bits for the macroblocks of a
video segment is less than the predetermined bit budget. A slightly lower bit budget can be
used for this purpose to accommodate prediction errors that might otherwise result in the
dropping of AC coefficients.
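The text does not specify how the selector searches for the vectors, so the following Python sketch uses a simple greedy rule as an assumption: start every macroblock at the finest vector and repeatedly coarsen whichever macroblock is currently predicted to cost the most bits. The function name, the `margin` parameter and the estimates in the demo are all hypothetical.

```python
def select_vectors(estimates, budget, margin=0):
    """Pick one quantization vector index per macroblock.

    estimates[i][n]: predicted bits for macroblock i using vector index n,
                     ordered from finest (n = 0) to coarsest
    budget:          bit budget for the video segment
    margin:          bits held back to absorb prediction errors
    """
    target = budget - margin
    choice = [0] * len(estimates)

    def total():
        return sum(est[n] for est, n in zip(estimates, choice))

    while total() >= target:
        # Macroblocks that can still move to a coarser vector.
        candidates = [i for i, n in enumerate(choice)
                      if n + 1 < len(estimates[i])]
        if not candidates:
            break  # every macroblock is already at its coarsest vector
        worst = max(candidates, key=lambda i: estimates[i][choice[i]])
        choice[worst] += 1
    return choice


# Made-up estimates for a five-macroblock segment and three candidate vectors.
estimates = [
    [900, 600, 300],
    [500, 400, 200],
    [700, 500, 250],
    [300, 200, 100],
    [400, 300, 150],
]
print(select_vectors(estimates, budget=2560))              # [1, 0, 0, 0, 0]
print(select_vectors(estimates, budget=2560, margin=100))  # [1, 0, 1, 0, 0]
```

Holding back a margin makes the selector coarsen one more macroblock in this demo, which mirrors the use of a slightly lower budget to guard against prediction errors.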

The quantization module 404 then quantizes the AC coefficients of the video segment
stored in the segment memory 403 using the selected macroblock quantization vectors.
These QVs are then mapped back to the corresponding quantization step size (QNO) and
class number (CNO) using standard DV relationships, as shown in Table 4, to provide
these parameters in the output bit stream, as required by the DV standard.

Table 4

Quantization number (QNO)         Quantization step
by class number (CNO)             by area number (ANO)

  0     1     2     3               0     1     2     3
 15     -     -     -               1     1     1     1
 14     -     -     -               1     1     1     1
 13     -     -     -               1     1     1     1
 12    15     -     -               1     1     1     1
 11    14     -     -               1     1     1     1
 10    13     -    15               1     1     1     1
  9    12    15    14               1     1     1     1
  8    11    14    13               1     1     1     2
  7    10    13    12               1     1     2     2
  6     9    12    11               1     1     2     2
  5     8    11    10               1     2     2     4
  4     7    10     9               1     2     2     4
  3     6     9     8               2     2     4     4
  2     5     8     7               2     2     4     4
  1     4     7     6               2     4     4     8
  0     3     6     5               2     4     4     8
  -     2     5     4               4     4     8     8
  -     1     4     3               4     4     8     8
  -     0     3     2               4     8     8    16
  -     -     2     1               4     8     8    16
  -     -     1     0               8     8    16    16
  -     -     0     -               8     8    16    16



The variable length encoding module 407 scans the quantized data in zigzag order
according to the selected DCT mode of each 8x8 pixel block. The data is then run-length
coded and the run-length codes are translated to variable length codes using a standard
Huffman table. The encoded data is combined with the DC coefficients and its respective
headers. The data is then re-arranged in a particular format specified by the DV
specifications for output.

In an alternative embodiment, the video encoder can be implemented using the simpler act
metric described above. The sum of spatial activity metric for a macroblock is then
defined as:

$$s\_act_i = \sum_{j=1}^{6} \sum_{k=0}^{3} \frac{\sum \left| AC_{ijk} \right|}{q_{ijk}},$$

where i, j, k denote, respectively, the macroblock in the video segment, the j-th 8x8 pixel
block in that macroblock and the area number. The magnitudes of the AC coefficients in a
particular area of an 8x8 pixel block are summed and then divided by the common
quantization step used in that area. In this embodiment, the predictor module 406 models
the process of quantization and variable length coding using the s_act metric. It calculates
the s_act values and then predicts the number of bits generated for all possible quantization
vectors for a macroblock using an empirically determined relationship between s_act and
the number of bits, as described below. S_act can be determined using a set of basic
quantization vectors derived through a method similar to that described above for
determining s_actp, but by considering only the quantization vectors.


As described above, the relationship between metric values and the numbers of bits of
encoded data can be implemented as a mathematical function or as a non-linear lookup
table. The predictor 406 determines a bit count estimate from a metric value using a
mathematical power law function. For example, the relationship between bits, the number
of bits of encoded data, and s_act, the sum of spatial activity metrics of act, is represented
by the equation:

$$bits + d = a \cdot (s\_act + c)^{b},$$

where a, b, c and d are adjustable parameters, with b<1, as shown in Figure 5. Example
values are a=195, b=0.35, c=160 and d=1152.

Similarly, the relationship of the actp metric and the number of bits is represented by the
equation:

$$bits + d = a \cdot (s\_actp + c)^{b},$$

where s_actp refers to the sum of spatial activity metrics of actp, and a, b, c and d are the
parameters of the optimal curve, with b<1. Example values corresponding to the weights
{1,2,4,8} are a=20.8, b=0.53, c=15.5 and d=88.9.
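Both power-law relationships, and the look-up table alternative mentioned earlier, can be sketched in Python using the example parameter values quoted in the text. The function names, the table step of 64 and the table range are hypothetical choices for this illustration.

```python
# Example parameter sets quoted in the text.
S_ACT_PARAMS = dict(a=195, b=0.35, c=160, d=1152)
S_ACTP_PARAMS = dict(a=20.8, b=0.53, c=15.5, d=88.9)


def predict_bits(metric, a, b, c, d):
    """Solve bits + d = a * (metric + c) ** b for bits."""
    return a * (metric + c) ** b - d


def build_lut(params, max_metric, step):
    """Tabulate the predictor at multiples of `step` up to `max_metric`."""
    return [predict_bits(m, **params) for m in range(0, max_metric + 1, step)]


def lookup_bits(lut, metric, step):
    """Nearest-lower table entry, clamped to the last entry."""
    return lut[min(int(metric) // step, len(lut) - 1)]


lut = build_lut(S_ACTP_PARAMS, max_metric=4096, step=64)
for metric in (0, 128, 1024):
    print(metric,
          round(predict_bits(metric, **S_ACT_PARAMS), 1),
          round(predict_bits(metric, **S_ACTP_PARAMS), 1),
          round(lookup_bits(lut, metric, step=64), 1))
```

With both example parameter sets the predicted bit count is close to zero when the metric is zero, and because b<1 the curve flattens as the metric grows.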

The parameters for either equation are initially determined experimentally from a generic
or reference video sequence by a calibration process. The calibration process is executed
by a calibration system that is not part of the video encoder. The calibration system
includes a processor and associated software modules having code for executing the steps
of the calibration process. The equation parameters determined by the calibration process
are stored in memory of the predictor 406 during manufacture.

The steps of the calibration process are as follows:

(i) The weighted DCT coefficients of each macroblock in a single picture frame of
the sequence are quantized by all possible quantization step sizes.

(ii) The quantized data is variable length encoded to find the actual numbers of bits
generated from the video frame for each quantization step size.

(iii) The quantization step size for the bit rate closest to a target bit rate is selected.

(iv) The sum of the metric values generated from the blocks quantized by that
selected step size is calculated to determine a data point comprising the number
of bits generated and the corresponding metric value for the selected frame.

(v) Further data points are determined by repeating steps (iii) - (iv), for a range of
different target bit rates.

(vi) The best-fit curve that minimises the mean square error between the data points
and the curve is found, ignoring outlying data points. A slight overestimation
is employed. For example, the best fit curve can be offset so that it lies above
80% of the data points.






(vii) The best-fit curve is used to estimate the bits generated for a selected
macroblock of the reference video data for all possible quantization step sizes.

(viii) A step size is selected to obtain a bit rate closest to a target bit rate.

(ix) The quantized data is variable length encoded.

(x) Steps (vii)-(ix) are repeated for all macroblocks in a short video sequence, for
example 120 frames.

(xi) The signal-to-noise ratio (SNR) of the frames is calculated. For example, a
peak SNR (pSNR) for an MxN frame is given by:

$$\mathrm{pSNR} = 10 \log_{10} \frac{255^{2}\, M N}{\sum_{x=1}^{M} \sum_{y=1}^{N} \left[ f(x,y) - f'(x,y) \right]^{2}}$$

where 255 is the gray scale range, f(x,y) is the pixel (x,y) value of the original
frame and f'(x,y) is the pixel (x,y) value of the encoded and then decoded
frame.

For a sequence of frames, the average of the pSNR values for all frames can
be used as the SNR value.

(xii) The process from step (vii) to step (xi) is repeated, adjusting the curve parameters
to maximise the SNR of the video sequence.
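
Two of the calibration steps lend themselves to a short sketch: the offset of step (vi), which lifts the best-fit curve above a chosen fraction of the data points, and the pSNR of step (xi). The Python below is a minimal illustration under stated assumptions. The function names are hypothetical, frames are assumed to be lists of pixel rows, and the curve fitting itself is not shown.

```python
import math


def coverage_offset(points, curve, coverage=0.8):
    """Vertical offset that makes curve(x) + offset lie on or above the
    given fraction of (x, y) data points (step (vi), 80% by default)."""
    residuals = sorted(y - curve(x) for x, y in points)
    k = max(math.ceil(coverage * len(residuals)) - 1, 0)
    return residuals[k]


def psnr(original, decoded, peak=255):
    """Peak SNR in dB between two equally sized frames (lists of rows)."""
    m, n = len(original), len(original[0])
    sq_err = sum((o - r) ** 2
                 for row_o, row_r in zip(original, decoded)
                 for o, r in zip(row_o, row_r))
    if sq_err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 * m * n / sq_err)


def sequence_snr(originals, decodeds):
    """Average of the per-frame pSNR values, used as the sequence SNR."""
    values = [psnr(o, d) for o, d in zip(originals, decodeds)]
    return sum(values) / len(values)
```

In a calibration loop of the kind described in step (xii), `sequence_snr` would serve as the quantity to be maximised while the curve parameters are varied.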

The embodiments described above exploit block spatial activity metrics to determine a
predicted value for the number of bits generated by video coding. However, these are not
the only metrics suitable for this purpose. For example, the process of Huffman encoding
was considered. Huffman coding is a form of entropy coding and is dependent on the
occurrence probabilities for the different run-length symbols that are generated by
run-length encoding. It was observed that, through statistical averaging, the total codeword
length converges, given a sufficient number of run-length symbols present. The number of
run-length symbols can be found from the output of the processes of quantization,
run-length coding and partial Huffman coding. The partial Huffman coding here refers to the
splitting of a (run, amplitude) code into two or more run-length symbols when an entry of
run and amplitude was not found in the Huffman table. However, the actual process of
repeated quantization and run-length coding is undesirable.

It was found that the number of weighted non-zero AC coefficients after quantization,
referred to as sym, can be correlated with the number of data bits generated by encoding,
the reason being that the Huffman table was designed for low occurrence of splitting (run,
amplitude) codes, and that the relative difference between the numbers of run-length
symbols and non-zero AC coefficients would be small, given a sufficient number of
run-length symbols used. The number of weighted non-zero AC coefficients can be
determined by counting the number of AC coefficients equal to or greater than the
quantization step size; the actual process of quantization is not necessary.
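
The equivalence between counting by comparison and counting after actual quantization can be checked with a small Python sketch. It assumes that coefficients are compared by magnitude and that quantization truncates toward zero; both are assumptions made for illustration, and the function names are hypothetical.

```python
def sym_by_comparison(ac_coeffs, step):
    """Count AC coefficients whose magnitude is at least the step size."""
    return sum(1 for c in ac_coeffs if abs(c) >= step)


def sym_by_quantization(ac_coeffs, step):
    """Count coefficients that stay non-zero after dividing by the step
    size, assuming the quantizer truncates toward zero."""
    return sum(1 for c in ac_coeffs if abs(c) // step != 0)


if __name__ == "__main__":
    coeffs = [0, 3, -5, 8, -16, 1, 31]  # hypothetical weighted AC values
    for step in (1, 2, 4, 8, 16, 32):
        print(step, sym_by_comparison(coeffs, step),
              sym_by_quantization(coeffs, step))
```

Under the truncation assumption the two counts agree for every step size, which is why the comparison alone is sufficient.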

In a further embodiment, the predictor 406 of the video encoder of Figure 2 models the
process of quantization and variable length coding using a metric function to determine
sym values from input video data. The predictor 406 calculates sym values and then
determines estimates for the numbers of bits of encoded data generated for all possible
quantization vectors for a macroblock using an empirically determined relationship
between sym and the number of bits. The selector 405 then determines suitable
quantization parameters by selecting the bit count estimate closest to a desired bit count.
The encoded data is then generated by a single-pass quantization and variable length
encoding process.

It will be apparent that the process for selecting quantization parameters to obtain a desired
bit rate using the sym metric is similar to the process described above using the spatial
activity metrics. Consequently, only the significant differences will be described below.



As shown in Figure 6, the predictor 406 of this embodiment includes a pseudo quantizer
block 801, a non-zero AC coefficient counter 802, and a sym predictor 803. For a given
set of quantization parameters, the bit rate of an 8x8 pixel block is estimated by the
predictor 406 as follows. The pseudo quantizer block 801 reads in the pixel block
coefficients sequentially and pseudo-quantizes them according to the quantization
parameters. Pseudo-quantization, in the form of comparisons of the AC coefficients with
quantization step sizes, is implemented rather than actual quantization. The non-zero
coefficient counter 802 then processes the output of pseudo-quantization, and counts the
number of non-zero quantized AC coefficients to determine sym. The sym predictor
module 803 then generates an estimate for the bit count using the respective quantization
parameters. This conversion of a sym value to a bit count estimate in the sym predictor 803
can be implemented using a mathematical function or a uniform look-up table, i.e., with
one bit rate value for each of the possible sym values. This process is then repeated for all
quantization parameters to determine corresponding bit count estimates. However, this
implementation can be modified such that minimal repetition is needed, as described
below.

A summary of the process of predicting bits at the macroblock level in the predictor 406 is
given in Table 5. The numbers of non-zero AC coefficients after quantization are first
counted for different step sizes and areas. The sym values for each macroblock for all
quantization vectors (QVs) are then derived from these intermediate values, as described
below. The macroblock sym is then determined by summing the six constituent block
values. The corresponding bit count estimate is then determined from the macroblock sym
value using a predetermined relationship between sym values and bit counts.

Table 5

Step | Description | No. of outputs
-----|-------------|---------------
1 | Calculate no. of non-zero AC coefficients for all blocks in a macroblock | no. of step sizes * no. of areas * 6
2 | Derive sym value according to step sizes in respective areas | no. of QVs * 6
3 | Sum sym values for a macroblock | no. of QVs
4 | Determine the corresponding bit count estimates from the sym model at the macroblock level | no. of QVs




The predictor 406 uses a simple and efficient hardware circuit to determine sym values. A
method for processing each 8x8 pixel block to determine sym values for all quantization
vectors is shown in Figure 7. At step 506, each of the 64 AC coefficients of a DCT block
is passed through a comparator comprising a series of OR gates, as shown in Figure 8, to
compute a respective 6-bit flag. The resulting flag bits f0-f1-f2-f3-f4-f5 represent Boolean
values indicating whether the corresponding 9-bit AC coefficient b0-b8 is greater than or
equal to the quantization coefficients 1-2-4-8-16-32, respectively. At step 508, each bit
of all the flags for one area number (ANO) is counted and the resulting sum is stored in
accumulator cells. For every quantization vector q0-q1-q2-q3, the corresponding
accumulator cells are summed, as shown in Figure 9. The value of sym for the macroblock
is accumulated through processing six 8x8 pixel blocks.
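
The flag, accumulator and summing stages can be modelled in software as a rough sketch. The Python below is not the hardware circuit of Figures 7 to 9; it assumes coefficients are handled by magnitude, and the function names, the sample coefficients and their area numbers are hypothetical. The shift in `flags` mirrors the OR-gate comparator: a magnitude is at least 2^k exactly when bit k or any higher bit is set.

```python
STEPS = (1, 2, 4, 8, 16, 32)  # quantization coefficients tested by f0..f5


def flags(coeff):
    """6-bit flag: entry k is True when |coeff| >= STEPS[k], i.e. when
    bit k or any higher bit of the magnitude is set."""
    mag = abs(coeff)
    return [(mag >> k) != 0 for k in range(len(STEPS))]


def accumulate(ac_coeffs, areas):
    """acc[area][k] counts the coefficients in `area` with flag k set."""
    acc = [[0] * len(STEPS) for _ in range(4)]
    for coeff, area in zip(ac_coeffs, areas):
        for k, flag in enumerate(flags(coeff)):
            acc[area][k] += flag
    return acc


def sym_for_vector(acc, qv):
    """Sum the accumulator cells selected by the vector (q0, q1, q2, q3)."""
    return sum(acc[area][STEPS.index(q)] for area, q in enumerate(qv))


if __name__ == "__main__":
    coeffs = [40, 9, 3, 1, 0, 17, 2, 5]   # hypothetical AC coefficients
    areas = [0, 0, 1, 1, 2, 2, 3, 3]      # hypothetical area numbers
    acc = accumulate(coeffs, areas)
    for qv in [(1, 1, 1, 1), (2, 4, 4, 8), (16, 16, 32, 32)]:
        print(qv, sym_for_vector(acc, qv))
```

Because the accumulator cells are filled once per block, the sym value for every quantization vector follows from four table look-ups and three additions, without repeating the comparison.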

For example, Table 6 shows a typical 8x8 pixel block after forward discrete cosine
transform and weighting.

Table 6

The processing of this block generates the accumulator cell values shown in Table 7 and
gives the output sym values shown in Table 8 for each quantization vector.



Table 7

[Accumulator cell values by quantizer step (1, 2, 4, 8, 16 and 32, corresponding to flags f0 to f5) and area number (ANO); the cell values are not legible in the source.]





Table 8

Quantization vector | Output
--------------------|-------
1 1 1 1 | 5+13+19+12=49
1 1 1 2 | 5+13+19+0=37
1 1 2 2 | 5+13+8+0=26
2 2 2 2 | 5+13+8+0=26
2 2 2 4 | 5+13+8+0=26
2 2 4 4 | 5+13+3+0=21
2 4 4 8 | 5+9+3+0=17
4 4 8 8 | 5+9+2+0=16
4 8 8 16 | 5+5+2+0=12
8 8 16 16 | 4+5+2+0=11
8 16 16 32 | 4+2+2+0=8
16 16 32 32 | 2+2+2+0=6

The predictor 406 determines an estimate of the number of bits of encoded data from the
number of non-zero AC coefficients after quantization using the equation:

$$\mathrm{bits} + d = a\,(\mathit{sym} + c)^{b},$$

where a, b, c and d are the parameters of the optimal curve, with b>1, as shown in
Figure 10. Example values are a=1.05, b=1.28, c=80 and d=286.6. The parameters are
determined using a process similar to that described above for the spatial
activity metrics.
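
As a sketch of how the sym model might feed the selector, the Python below evaluates the equation for each candidate quantization vector and picks the estimate closest to a target bit count. The function names, the macroblock sym values and the target of 600 bits are hypothetical; only the equation and its example parameters come from the text.

```python
def bits_from_sym(sym, a=1.05, b=1.28, c=80.0, d=286.6):
    """Rearranges bits + d = a * (sym + c) ** b for bits, using the
    example parameter values quoted in the text."""
    return a * (sym + c) ** b - d


def closest_vector(sym_by_qv, target_bits):
    """Return the quantization vector whose bit estimate is closest to
    the target, together with that estimate."""
    estimates = {qv: bits_from_sym(s) for qv, s in sym_by_qv.items()}
    best = min(estimates, key=lambda qv: abs(estimates[qv] - target_bits))
    return best, estimates[best]


if __name__ == "__main__":
    # Hypothetical macroblock sym values for three quantization vectors.
    sym_by_qv = {(1, 1, 1, 1): 200, (2, 2, 4, 4): 120, (8, 8, 16, 16): 40}
    print(closest_vector(sym_by_qv, target_bits=600))
```

With the example parameters the estimate is close to zero when sym is zero, and because b>1 the predicted bit count grows slightly faster than linearly with sym.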

In further alternative embodiments, the use of any of the metrics described above is
adapted with feedback to further enhance the performance of the encoder. In these
embodiments, the error between actual bit generation and the predicted bits for a
macroblock with its associated quantization vector is fed back,
as shown in Figure 11. A predictor module 706 is the same as the predictor 406 of Figure
2 except that it receives and uses feedback data from the feedback module 408. Similarly,
a quantization module 1104 is the same as the quantization module 404 but optionally
uses feedback data from the feedback module 408. A selector 1105 is the same as the
selector 405, but outputs the selected data quantity estimate to a difference module 1100.
The difference module 1100 determines an error value representing the difference between
the estimate and the corresponding actual quantity of encoded data generated by the
variable length coding module 407. This is used by the feedback module 408 to determine
a feedback error for each output macroblock. The feedback module 408 can be
frame-based, video segment-based and/or macroblock-based. In a frame-based feedback
module, the errors made in a frame for each s_act (or sym) value (depending upon which
metric is being used) are accumulated by the feedback module 408 and stored. At the end
of each frame encoding, the errors are averaged and the mean error for each s_act (or sym)
value is used to adjust the prediction model (i.e., the look-up table or the equation
parameters stored in memory associated with the predictor 406) in the predictor module
706. For example, the adjustment can be made to the equation parameter d. This ensures
that the model is adapted on a frame basis.
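
A frame-based feedback loop of this kind might be sketched as follows. This is an assumption-laden illustration, not the described module: the class name is hypothetical, the per-value mean errors are combined into a single overall mean, and the sign convention (error = actual minus predicted, subtracted from d because bits = a(x + c)^b - d) is chosen here for illustration.

```python
from collections import defaultdict


class FrameFeedback:
    """Collects prediction errors per metric value during a frame and
    adjusts the offset parameter d at the end of the frame."""

    def __init__(self, d):
        self.d = d
        self._errors = defaultdict(list)

    def record(self, metric_value, actual_bits, predicted_bits):
        # Error for one macroblock, keyed by its s_act (or sym) value.
        self._errors[metric_value].append(actual_bits - predicted_bits)

    def end_of_frame(self):
        # Mean error for each metric value seen in this frame.
        means = {m: sum(e) / len(e) for m, e in self._errors.items()}
        if means:
            overall = sum(means.values()) / len(means)
            # Assumed rule: under-prediction (positive error) lowers d,
            # which raises subsequent estimates by the same amount.
            self.d -= overall
        self._errors.clear()
        return means
```

A look-up table model could instead use the returned per-value means directly, adjusting each table entry by its own mean error.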



In addition, the feedback module 408 can be used to provide feedback for video
segments. The errors made in a video segment are accumulated, averaged and stored as a
local adjustment for the next segment prediction, according to:

[equation not recoverable from the source]

where δ refers to the weight of the error of the previous video segment on [...], and
error is the mean error value of a previous video segment. An example value is δ=1.0.

[...] segment is calculated by the feedback module 408 according to:

[equation not recoverable from the source]



When the numbers of [...] bits are above a threshold value, [...] is used for the remaining
macroblocks to be coded in the same [...]. [...] can be used, and the quantization vector
can be incremented [...] thresholds. The thresholds are determined empirically. For
example, 2560 and 2600 may be used, with incremental values of 1 and 2.

In yet another embodiment, two-pass encoding is used to improve the signal-to-noise ratio
(SNR) of video frames. As shown in Figure 12, the video encoder of this embodiment
includes additional quantization 704 and VLC 707 modules, and an output decision
module 709. The encoder includes a selector module 705 that is the same as the selector
module 1105, but with two sets of outputs, one for each encoding pass. Two identical
difference modules 1100, 1102 send error values to a feedback module 708 that is the
same as the feedback module 408, but has two error data inputs. In this embodiment, the
selector module 705 selects the two optimal quantization vectors such that the bit budget is
not exceeded.

The first quantization module 404 and variable length coding module 407 encode the
segment data by the coarser selected quantization vector to generate macroblock output
bits $len^{c}_{i}$. The second quantization module 704 and variable length coding module 707
encode the segment data by the finer selected quantization vector to generate macroblock
output bits $len^{f}_{i}$.

The steps executed by the output decision module 709 are as follows. If the total number
of output bits from the first quantization module 404 and variable length coding module
407, $t = \sum_{i} len^{c}_{i}$, [...], then the bits are output. Otherwise, the
macroblocks are sorted in order of priority; for example, in order of descending coarseness
of the quantization vector. For the first macroblock j, the difference of the bits of the two
quantization vectors, $len^{f}_{j} - len^{c}_{j}$, is added to t. If the added sum is less than the
bit budget, then the particular macroblock quantized by the finer step size is transmitted, and t
is updated. The quantization and class numbers are adjusted accordingly. Otherwise, the
macroblock with the coarser quantization is transmitted. The process is repeated for the
remaining four macroblocks until the entire video segment is transmitted.

Alternatively, the error between the actual and estimated bits for the macroblock quantized
by two step sizes is fed back, and the predictive model is adjusted; for example, using a
frame-based feedback module 408. The two-pass concept can be further extended for
three-pass encoding if necessary.
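
The macroblock-by-macroblock part of the output decision described above can be sketched as a greedy upgrade loop. The Python below is a minimal illustration that covers only that loop; the function name, the argument layout and the sample bit counts are hypothetical, and the priority order is supplied by the caller.

```python
def choose_outputs(coarse_bits, fine_bits, priority, bit_budget):
    """Decide, per macroblock, whether the finer or coarser encoding is
    transmitted.

    coarse_bits, fine_bits: bits per macroblock from the two passes.
    priority: macroblock indices in the order they should be considered.
    Returns (use_fine, total_bits).
    """
    total = sum(coarse_bits)
    use_fine = [False] * len(coarse_bits)
    for j in priority:
        candidate = total + (fine_bits[j] - coarse_bits[j])
        if candidate < bit_budget:
            # The finer encoding fits: transmit it and update the total.
            use_fine[j] = True
            total = candidate
    return use_fine, total


if __name__ == "__main__":
    coarse = [400, 500, 450, 480, 470]   # hypothetical first-pass bits
    fine = [520, 640, 500, 600, 560]     # hypothetical second-pass bits
    print(choose_outputs(coarse, fine, [1, 3, 0, 4, 2], bit_budget=2560))
```

Each macroblock is considered once, so the running total never exceeds the budget provided the all-coarse total is itself within budget.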

As an alternative to generating bit count estimates for different step sizes, a target metric
value (e.g., a desired sym value) corresponding to the target bit count can be determined
from the relationship between metric values and bit counts. The process of selecting a bit
count estimate closest to a target bit count then becomes a process of selecting a metric
(e.g., sym) value closest to a target metric value. This method is employed, for example,
by the sym-based video encoder shown in Figure 13. A sym analyzer 1006 calculates sym
values for different quantization vectors. The sym values are then transmitted to a selector
1005. The ratios of the actual numbers of bits generated from the output of the variable
length encoding block 407 and the corresponding sym values are determined and stored for
a number n of video segments by ratio modules 1004, 1006. A finite impulse response
(FIR) module 1008, as shown in Figure 14, then determines a ratio for an incoming video
segment as the sum of the weighted ratios for the n previous video segments. The current
target sym value corresponding to the target bit count is then determined by dividing the
latter by the FIR ratio and fed as an input to the selector 1005. The selector 1005 selects a
quantization vector such that the sym value for the video segment is less than the target
sym value. The data is then encoded using the selected quantization vector by the
quantizer block 404 and the variable length encoder 407. A two-pass topology, as
described above, can be adapted here also.
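
The ratio-based selection can be sketched as follows. This Python is an illustration under assumptions: the function names, the FIR weights and the sample values are hypothetical, and among the admissible vectors the one with the largest sym value is picked, which the text does not state explicitly.

```python
def fir_ratio(previous_ratios, weights):
    """Weighted sum of the bits-per-sym ratios of the n previous segments."""
    return sum(w * r for w, r in zip(weights, previous_ratios))


def target_sym(target_bits, previous_ratios, weights):
    """Target sym value: the target bit count divided by the FIR ratio."""
    return target_bits / fir_ratio(previous_ratios, weights)


def select_vector(sym_by_qv, sym_target):
    """Pick a quantization vector whose segment sym value is below the
    target. Assumption: the largest admissible sym value is preferred;
    if none is admissible, fall back to the smallest sym value."""
    admissible = {qv: s for qv, s in sym_by_qv.items() if s < sym_target}
    if not admissible:
        return min(sym_by_qv, key=sym_by_qv.get)
    return max(admissible, key=admissible.get)


if __name__ == "__main__":
    ratios = [10.0, 12.0, 11.0, 9.0]      # hypothetical bits per sym
    weights = [0.25, 0.25, 0.25, 0.25]    # hypothetical FIR weights
    goal = target_sym(2100, ratios, weights)
    syms = {(1, 1, 1, 1): 260, (2, 2, 4, 4): 190, (8, 8, 16, 16): 90}
    print(goal, select_vector(syms, goal))
```

Working in sym values avoids evaluating the bit count model for every quantization vector; only one division per segment is needed to convert the bit target.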

The embodiments described above relate to digital video, wherein the total number of bits
generated by encoding the macroblocks of a video segment is within a predetermined bit
budget. For the case of MPEG-2 [ISO-IEC/JTC1/SC29/WG11, "Test Model 5",
Draft, Apr. 1993] rate control, it will readily be appreciated that embodiments of the
invention can be applied to select an appropriate macroblock reference quantization
parameter such that the actual encoded bits follow the allocated target bits of a frame
closely. The bits to be generated by a frame picture are estimated for all different
quantization step sizes involved. The optimal step sizes for the macroblocks are chosen
such that the difference between the target and estimated bit use is minimal. Alternatively,
in variable bit rate coding where consistent picture quality is expected, the invention can be
employed to estimate the bit rate for a target quantization step such that, given some
constraints on the bit rate, for example maximum, minimum or average bit rate, the target
quantization steps with the least deviation from these constraints can be generated.

Many modifications will be apparent to those skilled in the art without departing from the
scope of the present invention as herein described with reference to the accompanying
figures.



